Learning Structural Classification Rules for Web-Page Categorization
نویسندگان
چکیده
Content-related metadata plays an important role in the effort of developing intelligent web applications. One of the most established form of providing contentrelated metadata is the assignment of web-pages to content categories. We describe the Spectacle system for classifying individual web pages on the basis of their syntactic structure. This classification requires the specification of classification rules associating common page structures with predefined classes. In this paper, we propose an approach for the automatic acquisition of these classification rules using techniques from inductive logic programming and describe experiments in applying the approach to an existing web-based information system.
منابع مشابه
Machine Learning Methods For Chinese Web Page Categorization
This paper reports our evaluation of k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM) on Chinese web page classi cation. Benchmark experiments based on a Chinese web corpus showed that their predictive performance were roughly comparable although ARAM and kNN slightly outperformed SVM in small categories. In addition, inserting rules into AR...
متن کاملWeb Page Categorization using Multilayer Perceptron with Reduced Features
The web is a huge repository of knowledge and numerous hyperlinks. Web also serves a broad diversity of user communities and global information service centers. Every day the knowledge in web page upwards rapidly. Web pages can be used to convey the knowledge to web users. Such voluminous size of the web makes an intricacy of web information retrieval, web content filtering and web structure mi...
متن کاملA Novel Efficient Classification Algorithm for Search
In this paper a new classification algorithm of Web documents into a set of categories, is proposed. The proposed technique is based on analyzing relationships between different documents and the terms they contain by producing a set of rules relating the category of the document, its terms and their frequencies. Each document is represented by a graph that correlates its most frequent combined...
متن کاملOptimizing Membership Functions using Learning Automata for Fuzzy Association Rule Mining
The Transactions in web data often consist of quantitative data, suggesting that fuzzy set theory can be used to represent such data. The time spent by users on each web page is one type of web data, was regarded as a trapezoidal membership function (TMF) and can be used to evaluate user browsing behavior. The quality of mining fuzzy association rules depends on membership functions and since t...
متن کاملCombining ILP with Semi-supervised Learning for Web Page Categorization
This paper presents a semi-supervised learning algorithm called Iterative-Cross Training (ICT) to solve the Web pages classification problems. We apply Inductive logic programming (ILP) as a strong learner in ICT. The objective of this research is to evaluate the potential of the strong learner in order to boost the performance of the weak learner of ICT. We compare the result with the supervis...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2002